Data Science Job Salaries - EDA¶

Introduction¶

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from noisy, structured and unstructured data, and to apply that knowledge across a broad range of application domains. Applications include healthcare, gaming, recommendation systems, logistics, fraud detection, internet search, targeted advertising, speech recognition, airline route planning and many more. In today's world, where the volume of data is growing exponentially, there is an urgent need for good data scientists who can derive meaningful information from these huge sources of data.

In this exploratory data analysis, we analyze a dataset of data science salaries from 2020 to 2022, covering various job titles and geographical regions. We aim to derive some information about how the features relate to, or correlate with, salary.

About the Dataset¶

The dataset consists of 11 features and 607 salary records. The features are described below:

Columns description

  • work_year: The year the salary was paid.
  • experience_level: The experience level in the job during the year, with the following possible values: EN (Entry-level / Junior), MI (Mid-level / Intermediate), SE (Senior-level / Expert), EX (Executive-level / Director).
  • employment_type: The type of employment for the role: PT (Part-time), FT (Full-time), CT (Contract), FL (Freelance).
  • job_title: The role worked in during the year.
  • salary: The total gross salary amount paid.
  • salary_currency: The currency of the salary paid, as an ISO 4217 currency code.
  • salary_in_usd: The salary in USD (FX rate divided by avg. USD rate for the respective year via fxdata.foorilla.com).
  • employee_residence: The employee's primary country of residence during the work year, as an ISO 3166 country code.
  • remote_ratio: The overall amount of work done remotely, with the following possible values: 0 (no remote work, less than 20%), 50 (partially remote), 100 (fully remote, more than 80%).
  • company_location: The country of the employer's main office or contracting branch, as an ISO 3166 country code.
  • company_size: The average number of people that worked for the company during the year: S (small, fewer than 50 employees), M (medium, 50 to 250 employees), L (large, more than 250 employees).
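The coded columns are easier to read in plots and tables once the codes are mapped to labels. The sketch below transcribes the codes from the column descriptions into plain dictionaries; the names `LOOKUPS` and `decode_row`, and the example record, are hypothetical and are not taken from the original notebook.

```python
# Code-to-label lookups transcribed from the column descriptions.
EXPERIENCE_LEVEL = {
    "EN": "Entry-level / Junior",
    "MI": "Mid-level / Intermediate",
    "SE": "Senior-level / Expert",
    "EX": "Executive-level / Director",
}
EMPLOYMENT_TYPE = {"PT": "Part-time", "FT": "Full-time", "CT": "Contract", "FL": "Freelance"}
REMOTE_RATIO = {0: "No remote work", 50: "Partially remote", 100: "Fully remote"}
COMPANY_SIZE = {"S": "Small", "M": "Medium", "L": "Large"}

# Hypothetical helper structure: column name -> lookup table.
LOOKUPS = {
    "experience_level": EXPERIENCE_LEVEL,
    "employment_type": EMPLOYMENT_TYPE,
    "remote_ratio": REMOTE_RATIO,
    "company_size": COMPANY_SIZE,
}


def decode_row(row):
    """Return a copy of `row` with coded fields replaced by readable labels.

    Unknown codes and columns without a lookup are left unchanged.
    """
    decoded = dict(row)
    for column, lookup in LOOKUPS.items():
        if column in decoded:
            decoded[column] = lookup.get(decoded[column], decoded[column])
    return decoded


# Made-up record, for illustration only.
example = {"experience_level": "SE", "employment_type": "FT",
           "remote_ratio": 100, "company_size": "M", "job_title": "Data Engineer"}
print(decode_row(example))
```

With pandas, the same dictionaries could be passed to `Series.map` on each coded column.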

Exploratory Data Analysis¶

Experience Level¶

There are 280 Senior-level records (approximately 46% of the experience_level column), 213 Mid-level records (approximately 35%), 88 Entry-level records (approximately 14.5%), and 26 Executive-level records (approximately 4.3%).
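As a quick check on the arithmetic, the percentage shares can be recomputed from the four counts quoted above. This is a minimal sketch; the function name `percentage_shares` is hypothetical.

```python
# Counts per experience level, as quoted in the text.
counts = {"SE": 280, "MI": 213, "EN": 88, "EX": 26}


def percentage_shares(counts):
    """Return each category's share of the total, in percent, rounded to 1 decimal."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 1) for level, n in counts.items()}


shares = percentage_shares(counts)
print(sum(counts.values()))  # total number of records covered by the counts
print(shares)
```

On a pandas dataframe, `value_counts(normalize=True)` on the column gives the same proportions directly.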

Employment Type¶

  • Full-time roles make up roughly 97% of the employment_type column.

Job Title¶

Questions:

  • How many unique job titles are contained in the dataframe?
  • What are the top 10 job titles [as contained in the dataframe]?

There are a total of 50 unique job titles in the job_title column. In the word cloud, which is based on the most frequent job titles in the dataframe, Data Scientist, Data Analyst, Data Engineer, Machine Learning Engineer, Data Science Manager, Data Analytics Manager, Big Data Engineer, Machine Learning Scientist, etc. appear bolder than the others [based on frequency of occurrence]. Let's narrow the word cloud down to the top 10 job titles.

The top 10 job titles, from the plot above, are: Data Scientist; Data Engineer; Data Analyst; Machine Learning Engineer; Research Scientist; Data Science Manager; Data Architect; Big Data Engineer; Machine Learning Scientist; and Data Analytics Manager, in that order.
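The two questions in this section (number of unique titles, and the most frequent ones) can be answered with a frequency count. The sketch below uses `collections.Counter` on a short made-up list of titles; the list and the helper name `top_n` are illustrative assumptions, not the notebook's data or code.

```python
from collections import Counter


def top_n(values, n=10):
    """Return the n most frequent values as (value, count) pairs, most frequent first."""
    return Counter(values).most_common(n)


# Made-up sample standing in for the job_title column.
job_titles = [
    "Data Scientist", "Data Engineer", "Data Scientist",
    "Data Analyst", "Data Engineer", "Data Scientist",
]

print(len(set(job_titles)))   # number of unique job titles in the sample
print(top_n(job_titles, 2))   # the two most frequent titles with their counts
```

With pandas, `nunique()` and `value_counts().head(10)` on the job_title column would give the equivalent results.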

Company Size¶

Question:

  • Which company size has the highest number of job listings?

From the plot above, we see that Medium (M) companies - companies with 50 to 250 employees - have the highest frequency in the distribution of company sizes [in the dataframe], followed by Large (L) companies - companies with more than 250 employees. Small (S) companies - companies with fewer than 50 employees - are the least frequent.

Employee_residence¶

Let's investigate the distribution of employees' countries of residence.

Question:

  • What are the top 10 countries of employee residence?

From the plot above, we see that the top 10 countries of employee residence are: United States of America (US), Great Britain (GB), India (IN), Canada (CA), Germany (DE), France (FR), Spain (ES), Greece (GR), Japan (JP), and Pakistan (PK), in that order.

Company_location¶

Question:

  • What are the top 10 company locations (countries) by number of job listings?

The top 10 company locations are: United States of America (US), Great Britain (GB), Canada (CA), Germany (DE), India (IN), France (FR), Spain (ES), Greece (GR), Japan (JP), and Austria (AT).

Work Year¶

  • How does the number of job listings differ from 2020 to 2022?
  • Is there any trend from 2020 to 2022?

We see an upward trend in the number of job listings from 2020 to 2022, with each year making up a larger percentage of the records than the one before.

Salary In USD¶

[Figure: Salary (USD) Distribution]

The salaries (in USD) are concentrated between roughly 63k and 150k, with a median salary of approximately 102k. The histogram also shows that the distribution is right-skewed. From the box plot, we see that there are salaries (in USD) above the upper fence (276k) of the plot. Let's further investigate these salaries.
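One common convention for flagging such salaries is Tukey's rule, which places the upper fence at Q3 + 1.5 × IQR. The sketch below applies it with the standard library (Python 3.8+) to a made-up list of salaries; the data and function names are hypothetical. Plotting libraries may instead report the largest observation that lies below this threshold, so the value computed this way need not match the fence read off a plot.

```python
import statistics


def tukey_upper_fence(values):
    """Upper fence under Tukey's rule: Q3 + 1.5 * (Q3 - Q1)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def upper_outliers(values):
    """Return the values lying strictly above the upper fence."""
    fence = tukey_upper_fence(values)
    return [v for v in values if v > fence]


# Made-up salaries in USD, for illustration only.
salaries = [40_000, 60_000, 63_000, 80_000, 102_000,
            120_000, 150_000, 170_000, 450_000]

print(tukey_upper_fence(salaries))
print(upper_outliers(salaries))
```

The quartile method matters for small samples: `method="inclusive"` treats the data as the full population, while the default `"exclusive"` method can give slightly different quartiles.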

Remote ratio¶

From the plot, we see that there are 380 jobs labeled Fully Remote (approx. 63% of the listings), 130 jobs labeled Partially Remote (approx. 21%), and 99 jobs labeled No Remote (approx. 16%).

Salary by Company size¶

Question:

  • What is the average salary for each company size?

We see that the average salary differs across company sizes. [From the plots above] the mean salary for a job listing is approximately 77,630 USD at small companies, 116,910 USD at medium companies, and 119,240 USD at large companies, while the median salaries are 65,000 USD (small), 113,188 USD (medium), and 100,000 USD (large).
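Comparing the mean with the median per group is a simple way to see how much a few high salaries pull the average upward. The sketch below groups made-up (company_size, salary) pairs and reports both statistics; the data and the helper name `mean_and_median_by_group` are hypothetical.

```python
import statistics
from collections import defaultdict


def mean_and_median_by_group(pairs):
    """Group (key, value) pairs by key and return {key: (mean, median)}."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: (statistics.mean(vals), statistics.median(vals))
            for key, vals in groups.items()}


# Made-up (company_size, salary_in_usd) pairs, for illustration only.
pairs = [
    ("S", 50_000), ("S", 60_000), ("S", 65_000), ("S", 70_000), ("S", 145_000),
    ("M", 90_000), ("M", 110_000), ("M", 113_000), ("M", 125_000), ("M", 147_000),
]

for size, (mean, median) in mean_and_median_by_group(pairs).items():
    # A mean well above the median suggests a right-skewed group.
    print(size, mean, median)
```

With pandas, `groupby("company_size")["salary_in_usd"].agg(["mean", "median"])` would produce the same kind of summary.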

While this is an interesting insight into the mean and median salary by company size, let's further investigate these salaries with respect to experience level...

Looking at the plots, we see a progressive increase across the experience levels [from Entry_level/Junior to Executive_level/Director]. From the histogram-box plot, we see that the median salaries for the various experience levels are:

  • Entry_level/Junior - approximately 56,500 USD;
  • Mid_level/Intermediate - approximately 76,940 USD;
  • Senior_level/Expert - approximately 135,500 USD;
  • Executive_level/Director - approximately 171,440 USD.

Also, the distribution for each experience level is right-skewed (and contains probable outliers). These outliers can easily be spotted as the dots to the right of each box plot.

Question:

  • Does the mean salary, by experience level, differ across the three company sizes?

From the plot, we see a general upward movement of the mean salary from Entry_level/Junior to Executive_level/Director for each company size, except for Small companies, where the Mid_level/Intermediate mean salary is lower than the Entry_level/Junior mean salary. This is quite unexpected and needs to be investigated further.
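One possible first step in that investigation is to look at how many records sit behind each mean, since a cell with very few records can produce a misleading average. The sketch below summarises made-up rows by (company_size, experience_level) and reports the count next to the mean; the rows and the name `cell_summary` are hypothetical, and this is only one of several checks that could be run.

```python
import statistics
from collections import defaultdict


def cell_summary(rows):
    """Return {(company_size, experience_level): (count, mean_salary)}."""
    cells = defaultdict(list)
    for size, level, salary in rows:
        cells[(size, level)].append(salary)
    return {cell: (len(vals), statistics.mean(vals)) for cell, vals in cells.items()}


# Made-up (company_size, experience_level, salary_in_usd) rows.
rows = [
    ("S", "EN", 60_000), ("S", "EN", 64_000),
    ("S", "MI", 50_000),
    ("M", "EN", 55_000),
    ("M", "MI", 80_000), ("M", "MI", 90_000),
]

for (size, level), (count, mean) in sorted(cell_summary(rows).items()):
    # Small counts are a hint that the mean for that cell is unreliable.
    print(size, level, count, mean)
```

In this toy sample the Small/Mid-level mean is below the Small/Entry-level mean, but it rests on a single record, which illustrates why the count is worth reporting alongside the mean.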

Question:

  • What's the mean salary by remote type from 2020-2022?

Interestingly, between 2020 and 2022, remote job listings have the highest mean salary, followed by no-remote listings, with hybrid listings having the lowest mean salary.

Question:

  • What are the average salaries of the top 10 job titles?
  • What are the top 10 job titles with the highest average salaries by experience level?

Although Data Scientist, Data Engineer and Data Analyst are the 3 most frequent job titles, we can see that they are not the highest paid among the top 10. Among the top 10 job titles [based on frequency of occurrence in the dataframe], the job titles with the highest mean salary are Director of Data Science, with the highest average salary of 195,074 USD; followed by Data Architect, at approximately 177,874 USD; and Data Science Manager, at approximately 158,329 USD.

Next, let's find out what the top 10 job titles with the highest mean salary by experience level are.

Although Mid_level/Intermediate and Entry_level/Junior experience appear in the plot, we see that the top 10 job titles by average salary mainly require Executive_level/Director and Senior_level/Expert experience.

Next, let's see what the median salaries are for the top 10 job titles at the Entry_level/Junior experience level.

The top 10 mean salary plot shows that the three (3) most frequent job titles at the Entry_level/Junior experience level are not the highest paid. Rather, Machine Learning Engineer, Research Scientist and Business Data Analyst are the three (3) job titles with the highest mean salaries.

Analysis Summary and Conclusion¶

Analysis of this data gives us the following information about the Data Science field:

  • Data Science is gaining traction: we saw an upward trend between 2020 and 2022.
  • Medium and large companies offer higher salaries than small companies.
  • Generally, irrespective of company size, the more experienced one is, the higher the salary.
  • Remote jobs are more common and offer higher salaries than no-remote and hybrid jobs. This may be a result of the pandemic, but we can't ascertain that because the dataset contains no records from before the pandemic.
  • Among the top 10 most common job titles, Data Architect, Machine Learning Scientist, and Data Science Manager are the top 3 highest paid job roles.
  • Among the top 10 most common job titles at Entry level, the Machine Learning Engineer, Research Scientist, and Business Data Analyst roles offer the highest salaries.

In conclusion, pursuing a career in Data Science related jobs is a very good choice, with tremendous opportunities for advancement in the future. Already, demand is high, salaries are competitive, and the perks are numerous. No wonder it was referred to as the "sexiest job of the 21st century" by Harvard Business Review in 2012.
